Fast and accurate multi-class protein fold recognition with spatial sample kernels.
Abstract
Establishing structural or functional relationships between sequences, for instance to infer the structural class of an unannotated protein, is a key task in biological sequence analysis. Recent computational methods such as profile and neighborhood mismatch kernels have shown very promising results for protein sequence classification, but at the cost of high computational complexity. In this study we address multi-class sequence classification problems using a class of string-based kernels, the sparse spatial sample kernels (SSSK), that are both biologically motivated and efficient to compute. The proposed methods can work with very large databases of protein sequences and show substantial improvements in computing time over existing methods. Applying the SSSK to multi-class protein prediction problems (fold recognition and remote homology detection) yields significantly better performance than existing state-of-the-art algorithms.
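The abstract does not define the kernel itself, so the following is only a minimal sketch of the general idea behind a spatial sample kernel, not the authors' implementation. It assumes the simplest variant: each feature is a pair of single residues together with the gap between them, and the kernel is the inner product of the two sequences' feature counts. The names `spatial_pair_features`, `spatial_pair_kernel`, and the parameter `max_dist` are hypothetical and chosen for illustration.

```python
from collections import Counter


def spatial_pair_features(seq, max_dist=5):
    """Count (residue, gap, residue) features with 1 <= gap <= max_dist.

    Assumption: single-residue samples and pairwise features only;
    the published kernels may use longer samples or triples.
    """
    feats = Counter()
    n = len(seq)
    for i in range(n):
        for gap in range(1, max_dist + 1):
            j = i + gap
            if j >= n:
                break
            feats[(seq[i], gap, seq[j])] += 1
    return feats


def spatial_pair_kernel(x, y, max_dist=5):
    """Inner product of the spatial feature counts of sequences x and y."""
    fx = spatial_pair_features(x, max_dist)
    fy = spatial_pair_features(y, max_dist)
    # Iterate over the smaller feature map; missing keys count as zero.
    if len(fx) > len(fy):
        fx, fy = fy, fx
    return sum(count * fy[feat] for feat, count in fx.items())


if __name__ == "__main__":
    print(spatial_pair_kernel("MKVLA", "MKILA", max_dist=3))
```

In this sketch, feature extraction takes time proportional to the sequence length times `max_dist`, which illustrates why kernels of this kind can be cheap to compute compared with mismatch-based kernels. A multi-class classifier would then be trained on the resulting kernel matrix.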
Similar resources
A fast, large-scale learning method for protein sequence classification
Motivation: Establishing structural and functional relationships between sequences in the presence of only the primary sequence information is a key task in biological sequence analysis. This ability can be critical for tasks such as making inferences of the structural class of unannotated proteins when no secondary or tertiary structure is available. Recent computational methods based on profi...
A New Class of Spatial Covariance Functions Generated by Higher-order Kernels
Covariance functions and variograms play a fundamental role in exploratory analysis and statistical modelling of spatial and spatio-temporal datasets. In this paper, we construct a new class of spatial covariance functions using the Fourier transform of some higher-order kernels. Moreover, we extend this class of spatial covariance functions to the spatio-temporal setting using the idea used in...
Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection
MOTIVATION The problems of protein fold recognition and remote homology detection have recently attracted a great deal of interest as they represent challenging multi-feature multi-class problems for which modern pattern recognition methods achieve only modest levels of performance. As with many pattern recognition problems, there are multiple feature spaces or groups of attributes available, s...
Learning Large Margin First Order Decision Lists for Multi-Class Classification
Inductive Logic Programming (ILP) systems have been successfully applied to solve binary classification problems. It remains an open question how an accurate solution to a multi-class problem can be obtained by using a logic based learning method. In this paper we present a novel logic based approach to solve challenging multi-class classification problems. Our technique is based on the use of ...
Multi-class protein fold recognition using support vector machines and neural networks
MOTIVATION Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examined many issues important for a practical recognition system. RESULTS Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known '...
Journal title:
- Computational systems bioinformatics. Computational Systems Bioinformatics Conference
Volume 7
Publication date: 2008